Wikipedia:Wikipedia Signpost/2012-05-07/WikiProject report
Say What?: WikiProject Languages
This week, we conversed with WikiProject Languages. Started in October 2003, the project has grown to include over 7,000 pages maintained by a lengthy list of volunteers. The project provides a template for starting new articles about languages, maintains an infobox, and works on a variety of open tasks including a subproject to ensure that an article exists for every language that has been assigned an ISO 639-3 language code. WikiProject Languages is descended from WikiProject Linguistics and is a parent of WikiProject Endangered Languages and WikiProject Latin. We interviewed G Purevdorj, Maunus, and Angr.
What motivated you to join WikiProject Languages? What languages do you know?
- G Purevdorj: I was studying linguistics with a typological perspective and could thus relate to WP Languages.
- Maunus: I was mostly editing language articles when I first started. I am a linguist by profession, specializing in Mesoamerican languages. I speak Danish, English, and Spanish fluently. I have worked professionally with the Nahuatl and Otomi indigenous languages of Mexico and have done field work on both of them. I have studied Kalaallisut (West Greenlandic), Latin, and German, and have a basic grammatical knowledge of them. I have basic knowledge of the historical linguistics of Indo-European, Uto-Aztecan, Mayan, and Mixe-Zoquean languages. I read French and Portuguese with a dictionary.
- Angr: I've always been interested in languages, I have a Ph.D. in linguistics, and my original motivation for joining Wikipedia was to add information about Irish, so joining WikiProject Languages was a very natural step for me to take. I know English and German best, and can converse in French. I have a very good theoretical knowledge of Irish, but couldn't carry on a conversation in it beyond the simplest things. For Welsh, both my theoretical knowledge and my practical knowledge are rather less than for Irish. I can read Spanish, Portuguese, Italian, and Dutch without too much difficulty if I have a dictionary. I have a good theoretical knowledge of Latin and Ancient Greek, and a fair theoretical knowledge of the other ancient Indo-European languages. I have a fairly sound (no pun intended) knowledge of how Burmese phonology works, but I can't actually say anything in it besides Aung San Suu Kyi, and even then I can't get the tones right.
How well are languages covered on Wikipedia? Are some better covered than others are? How does the coverage of languages vary across the versions of Wikipedia written in languages other than English?
- G Purevdorj: We have a huge number of stubs, created by a few users, that contain basic classificational and geographic information. The 1,500 start-class articles and a subset of the stubs provide more context and some information on phonology, grammar, or sociolinguistics, but never on more than two (sub)areas in any detail. Only very few stub or start-class articles are actually concise sketches that strive for any form of overall coverage. Most better-known languages have larger articles, rated C-class or higher, but some of the better articles are about otherwise less-known languages. English Wikipedia tends to be more extensive than the other Wikipedias, except for articles on regional varieties.
- Maunus: Not well at all. The coverage is haphazard, depending on the specific expertise of particular editors. Even some of the world's major languages have very bad articles. Some of the best articles we have are about tiny exotic languages like Nafaanra or Tashlhiyt. Some editors, like User:Kwamikagami, are struggling valiantly to make sure that most of the world's languages are represented at least with stubs and organized in relation to what is known about their family relations. But it is a very large task, and it requires editors with expertise in each language group to be done well. Many language groups have only one or two experts doing work on them, and often they are unlikely to contribute to Wikipedia. That said, a number of professional linguists are Wikipedia editors and have done great work on the languages they specialize in. User:A R King (Pipil grammar), User:Davidjamesbeck (Totonacan languages, Upper Necaxa Totonac), User:Taivo (Numic languages, Timbisha language, Ukrainian language), User:Lavintzin (Tetelcingo Nahuatl), User:stevemarlett (Seri language), User:G.broadwell (Choctaw language, Chatino language), User:Blillehaugen (Zapotec languages), and User:G Purevdorj (Mongolian language) are some of the professional linguists I know of who have made excellent contributions on the languages they work with professionally. User:Miskwito has also done excellent work on Ojibwe language and Ottawa dialect (I think he is not a professional yet, but I'm pretty sure he will become one). I am mostly familiar with work done on Native American languages, but I am sure there are more professionals working on languages in the rest of the world.
- Angr: Obviously better-known languages are better covered than less-known languages. The availability of reliable sources pretty much forces that to be the case. It's much easier to find sources about a language that is spoken by millions of people and has official or quasi-official status in the country where it's spoken than it is to find sources about a language spoken by a few dozen people. For example, I don't think our article about Sentinelese can ever be significantly expanded unless the facts on the ground change.
Has the project borrowed anything from other language versions of Wikipedia? How much overlap do you see in editors working on multiple language versions of Wikipedia?
- G Purevdorj: The few articles I have read in different language versions read rather differently. Not too much overlap, but rather different perspectives.
- Maunus: No, not really. I've seen some of our articles being translated into other languages, but not by any of the editors who work with languages here. I don't generally contribute to other language wikis.
Are there any elements of a language that are difficult to cover in encyclopedia articles?
- Maunus: It takes very dedicated and diligent work to give decent coverage of grammar; most language articles stop after giving a phonemic inventory and basic typological information. Describing the grammar requires a high degree of expertise and writing skill, since grammars are often written in highly theoretical and technical ways that need to be translated into something accessible to readers who are not specialists, or who want information on the language that does not depend on a particular linguistic theory. It also requires a full view of the literature on the language, which often spans a couple of centuries, and the ability to weigh the different descriptions for accuracy and relevance. For many languages there is no complete grammar, only highly specialized articles on, for example, particular aspects of syntax and phonology. In such cases the literature is difficult to use, because it is too specialized to paint a full picture of the language, and the specialized information doesn't do much good on its own without background information.
- Angr: Some aspects of language, such as sentence-level intonation, are extremely difficult to explain using words alone. Sound recordings help, but they're few and far between, and have limitations of their own (such as being impossible to edit). Otherwise, most aspects of language can in principle be covered in an encyclopedia article, provided the sources are available.
Have you contributed to any pronunciation guides or recordings? How widespread is the use of pronunciation recordings in articles? What can be done to improve this?
- G Purevdorj: Well, pronunciation guides would not be in the spirit of a lexicon. Phonology sections exist in various lengths, and some are quite good. Otherwise, integrating small sound files into articles would probably be doable and useful. This has only rarely been done, I think for Swedish. Still, it would be quite a lot of work.
- Maunus: I've made exactly one recording, and I am not planning to do any more. It might be a good idea to have more of that but since I mostly work on small and endangered languages it is not on the top of my to-do list. Perhaps I should upload some of my recordings of the languages I've worked on.
Have you worked with either of WikiProject Languages' descendant projects, WikiProject Endangered Languages and WikiProject Latin? What are the biggest challenges faced by these projects? Is there collaboration among the projects?
- G Purevdorj: Most languages are endangered, so the coverage of WP EL and WP LANG should be about the same, thus we don't need both.
- Maunus: I was among the ones to start the Endangered Languages Project. Unfortunately, it is sort of inactive now. The biggest challenge is to remain active: many editors work on one language only and, when they are satisfied with the coverage, they stop. There is very little collaboration among projects.
- Angr: I'm a member of both those descendant projects, but neither of them is terribly active. The biggest challenge faced by these projects is probably editor apathy, and the fact (which applies to the main languages project as well) that editors tend to work alone rather than in collaboration.
The project has 7 Featured Articles and 12 Good Articles. Have you contributed to any of these? Have you learned any important lessons while working to promote an article to FA or GA status?
- G Purevdorj: I brought one article to GA, copyedited and promoted another article to GA, and demoted a number of older articles from GA. But no, I don't think I learnt a lot in the course of doing so. Basically, it's all linguistics, plus rating criteria that must be fulfilled. It's quite nice to know about picture descriptions for FA, for example, and to create those, but such knowledge is about the system here rather than about editing language-related topics in general.
- Maunus: I was the main contributor to Nahuatl (with a lot of help from User:CJLLWright, User:A R King and User:Lavintzin), to Mayan languages (with a lot of help from User:CJLLWright, User:Homunq and User:Madman2001) and to Greenlandic language (with a lot of help from User:G Purevdorj). When I was trying to get Otomi language to FA, I realized it wasn't worth the effort. I was very disheartened by the FA process, which I found mostly consisted of people chastising the nominator and demanding the inclusion of types of information that were irrelevant or non-existent. I would have preferred a more collaborative spirit, where the reviewers see the potential of the article and work with the nominator to improve it and make it as good as it can be. I don't mind demanding or even snarky reviewers if I submit an article to a journal, but in that case I have a personal stake in improving the content that I don't have here. This is something I am doing in my free time as a volunteer; I don't need people yelling at me or telling me what to do. I am happy to accept constructive criticism and advice if offered in a collaborative spirit, but that wasn't what I met at FAC. If I publish an article, my career advances, and that makes it worthwhile to deal with unreasonable reviewers. But if I get an article to FA, I get a gold star I can put on my userpage. That's not worth the hassle, and I'd rather focus on publishing articles then. In my view, FA depends on creating an amiable, collaborative environment. I do have a plan for getting Language to FA status, but that is a huge amount of work and I am not going to do it unless I have at least two or three co-nominators.
- Angr: I was the main contributor for Irish phonology. Bringing it up to FA status was a very satisfying experience, but I can understand some of Maunus's frustration. It's difficult dealing with reviewers who have no understanding of the topic you're writing about, and often their suggestions seem irrelevant or uninformed. The writing itself is difficult because it's extremely hard to write on such an arcane subject as Irish phonology in a way that can be understood by the average reader. As the writer, you don't want to have to walk the readers through a virtual introductory linguistics course and introductory phonology course before you can get to the matter at hand, but neither do you want 99.9% of your readers (especially on Today's Featured Article day) scratching their heads in puzzlement before they get to the end of the first sentence. It's a lot of work, and something I was only able to do while I was unemployed and could spend pretty much all day doing research and writing the article. Now that I work 40 hours a week again, there's no chance of my bringing another article up to featured status.
What are the project's most pressing needs? How can a new contributor help today?
- G Purevdorj: Of course, there is a vast number of languages that don't have any reasonable coverage. Turning stubs or start-class articles into C-class articles that include a reasonable share of overall information on grammar and sociolinguistics is presumably much more useful to the average reader and would make a lot of sense. If referencing is good, building up articles to C-class might have the best ratio of working time to gain. The other area is articles on linguistic theory, which are overall very poor. But non-linguists can't do that, and most linguists just don't invest the time.
- Maunus: We need ever more contributors with expert knowledge of particular languages. Language articles cannot be written to reasonable standards by editors without expert knowledge, meaning knowledge of both linguistics and the language itself (being a native speaker is rarely enough). Editors without expert knowledge could participate by expanding the thousands of stubs that we have on minor languages: information to write a decent C-class article is often readily available from sites such as [www.ethnologue.com], [www.wals.org], [1] or from specialized websites about the languages. Information such as where the language is spoken, the number of speakers, the language family, and whether the language is considered endangered is often readily available and does not require specialized knowledge.
- Angr: Obviously we need more experts, especially ones who can read languages in which descriptions of tiny languages may have been written. For example, there are dozens if not hundreds of small languages in the former Soviet Union, but if they've been described at all in print, they've been described in Russian, which means only editors with a working knowledge of Russian can use those sources to write articles. A new contributor can help by finding a language we don't have enough info on and going to the library!
Next week, we'll have a spot of tea with Wikipedia's editor support group. Until then, introduce yourself to the archive.